Real-time audio source separation by delay and attenuation compensation in the time domain

ABSTRACT

A system is provided for separating two audio channels recorded by an array of microphones. The system includes a calibration module for normalizing gain levels between a plurality of channels on each of a plurality of data frames, wherein each data frame is expressed in terms of time. The system further includes a delay parameter estimation module for accepting an output comprising the normalized channels, estimating a delay parameter for a plurality of data frame sizes over a plurality of lag times, and sorting delays to generate corresponding source separated outputs.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention is related to processing audio data, and more particularly to audio source separation in anechoic environments.

[0003] 2. Discussion of Prior Art

[0004] There is increased interest in using microphone arrays in a variety of audio source separation and, consequently, speech processing applications. Small arrays of microphones have improved on single microphone systems in speech separation and directional detection of sources for hands free communication and a variety of other speech enhancement and audio source separation applications. Blind or parametric source separation approaches have been applied to distinguish between input from different microphones but with limited success. Challenges such as reverberation, noise, and acoustical echoes still plague many approaches to blind separation of audio signals.

[0005] One method for automatically compensating for attenuation due to differences in the calibration of the microphones has been attempted, which implements a deconvolution stage on the order of about a thousand taps. This is computationally expensive and may be difficult to implement in real-time.

[0006] A mixing model has been proposed wherein a decorrelation criterion is determined for integer delays; the approach therefore assumes that the distance between the microphones is less than the distance from the sources. However, such assumptions about sources being far-field may not hold well, and thus the model may not be a good approximation of the environment. Another proposed refinement to the mixing model includes higher order tap coefficients. The overall model corresponds to a constrained physical situation.

[0007] Another set of related spatial filtering techniques are antenna array processing techniques. Such techniques assume information about the microphone array layout as a given. A delay and attenuation compensation (DAC) separation approach, for example, does not necessarily make this assumption; however, weaker information such as the distance or a bound on the distance between sensors can help during a parameter estimation phase.

[0008] Still other proposed techniques use robust beamforming. Adaptive beamformers assume a known direction of arrival. Beamforming can be applied to source separation to deconvolve source estimates. Various source separation approaches have attempted to combine independent component analysis (ICA) or blind source separation (BSS) and elements of a beamformer to improve the performance of ICA/BSS techniques. However, no known system or method exists for real-time source separation by delay and attenuation compensation.

[0009] Therefore, a need exists for a system and method of real-time source separation by delay and attenuation compensation in a time domain.

SUMMARY OF THE INVENTION

[0010] According to an embodiment of the present invention, a method is provided for separating at least two audio channels recorded using an array of at least two microphones. The method equalizes variances of a first channel and a second channel on a current data frame, recursively expresses means and variances of mixtures, and normalizes the second channel to a variance level substantially similar to a variance of the first channel.

[0011] On a current block of m data samples $x_j(t)$, $1 \le t \le m$, $1 \le j \le 2$, and index k, a current block mean $\bar{x}_j$ can be determined according to:

$\bar{x}_j = \frac{1}{m} \sum_{t=1}^{m} x_j(t)$

[0012] A running mean $\bar{x}_j^{(k-1)}$ can be updated by:

$\bar{x}_j^{(k)} = (1-\beta)\, \bar{x}_j^{(k-1)} + \beta\, \bar{x}_j$

[0013] where β is a learning rate.

[0014] A current block variance $\mathrm{Var}_j$ is determined according to:

$\mathrm{Var}_j = \frac{1}{m} \sum_{t=1}^{m} \left| x_j(t) - \bar{x}_j^{(k)} \right|^2$

[0015] A running variance $v_j^{(k-1)}$ is updated by:

$v_j^{(k)} = (1-\beta)\, v_j^{(k-1)} + \beta\, \mathrm{Var}_j$

[0016] Normalizing the second channel further includes normalizing an average energy to be similar to an average energy of the first channel according to:

$\hat{x}_2 = \sqrt{\frac{v_1^{(k)}}{v_2^{(k)}}}\; x_2$

[0017] The method determines delay parameters by minimizing a cross-covariance between two outputs. The cross-covariance between the outputs is expanded as:

$R_{y_1 y_2}(\tau) = R_{x_1 x_1}(d_1 - d_2 + \tau) - R_{x_1 x_2}(d_2 - \tau) - R_{x_1 x_2}(d_1 + \tau) + R_{x_2 x_2}(\tau)$

[0018] where $R_{x_i x_j}$ is the cross-correlation between $x_i$ and $x_j$, $1 \le i, j \le 2$. The method further includes determining sub-unit-delayed versions of cross-correlations, wherein the delay parameters are determined for a number of lags L.

According to an embodiment of the present invention, a system is provided for separating two audio channels recorded by an array of microphones. The system includes a calibration module for normalizing gain levels between a plurality of channels on each of a plurality of data frames, wherein each data frame is expressed in terms of time. The system further includes a delay parameter estimation module for accepting an output comprising the normalized channels, estimating a delay parameter for a plurality of data frame sizes over a plurality of lag times, and sorting delays to generate corresponding source separated outputs.

[0019] The source separated outputs of the delay parameter estimation module are output in real-time.

[0020] The calibration module compensates for attenuations at the microphones.

[0021] The delay parameter determines relative delays of arrival of wave fronts at each microphone.

[0022] According to an embodiment of the present invention, a method is provided for separating at least two audio channels recorded using an array of at least two microphones. The method includes constraining a mixing model of the at least two audio channels in a time domain to direct path signal components, and defining a plurality of delays with respect to a midpoint between microphones, wherein delays depend on the distance between sensors and the speed of sound. The method further includes inverting a mixing matrix, corresponding to the mixing model, in the frequency domain, and compensating for a plurality of true fractional delays and attenuations in the time domain, wherein values of the delays and attenuations are determined from an output decorrelation constraint.

[0023] The method includes estimating a complex filter for each microphone, wherein the complex filters define the mixing model.

[0024] The mixing matrix corresponding to the mixing model comprises two delay parameters and two parameters corresponding to the speed of sound.

[0025] The output decorrelation constraint is a function of two unknown delays and unknown scalar coefficients. An attenuation coefficient has a value substantially equal to one.

[0026] The method imposes a minimum variance criterion for a reverberant case over all linear filtering combinations of X₁ and X₂.

[0027] According to an embodiment of the present invention, a program storage device is provided, readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for separating at least two audio channels recorded using an array of at least two microphones. The method includes equalizing variances of a first channel and a second channel on a current data frame, recursively expressing means and variances of mixtures, and normalizing the second channel to a variance level substantially similar to a variance of the first channel.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] Preferred embodiments of the present invention will be described below in more detail, with reference to the accompanying drawings:

[0029] FIG. 1a is a diagram of a system for executing code according to an embodiment of the present invention;

[0030] FIG. 1b is a diagram of a system for separating mixed input according to an embodiment of the present invention;

[0031] FIG. 2 shows an impulse response for an echoic data set according to an embodiment of the present invention;

[0032] FIG. 3 shows segmental signal-to-noise ratio (SNR) separation results as a function of the difference in angles of arrival for an anechoic data set according to an embodiment of the present invention;

[0033] FIG. 4 shows segmental SNR separation results as a function of the higher angle of one of the two sources for an anechoic data set according to an embodiment of the present invention;

[0034] FIG. 5 shows segmental SNR separation results as a function of the difference in angles of arrival for an echoic data set according to an embodiment of the present invention;

[0035] FIG. 6 shows segmental SNR separation results as a function of the higher angle of one of the two sources for an echoic data set according to an embodiment of the present invention;

[0036] FIG. 7 illustrates an evolution of absolute and smoothed delay parameters (in samples) as a function of the number of frames processed for an anechoic example according to an embodiment of the present invention;

[0037] FIG. 8 shows the evolution of the instantaneous SNR for the example in FIG. 6 according to an embodiment of the present invention; and

[0038] FIG. 9 is a flow chart of calibrating inputs and determining delays according to an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0039] The present invention provides a system and method for separating two or more audio signals recorded using an array of microphones assuming an anechoic mixture model. The complexity and performance factors of an embodiment of the present invention have been measured. One with ordinary skill in the art will appreciate that various other embodiments can be built upon these results.

[0040] Although elements of the present invention are derived from blind source separation principles, the system and method implement an anechoic propagation model to reduce the complexity of the mixing model and make it possible to effectively identify and invert a mixing process using second-order statistics. For sources far away from the microphone array, for example, greater than one meter, the model can be simplified to depend on just a few parameters. According to an embodiment of the present invention, these parameters include relative delays in the arrival of wave fronts and attenuations at the microphones. The method estimates the parameters of a mixture to compensate for the true values according to a delay and attenuation compensation (DAC) method.

[0041] It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, the present invention may be implemented in software as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and micro instruction code. The various processes and functions described herein may either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

[0042] It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings of the present invention provided herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

[0043] According to an embodiment of the present invention, to achieve a real-time implementation of a source separation model, an estimation of the direction of arrival can be determined based on a cross-covariance. Further, variations in the microphones, such as differences in gain, can be accounted for. The system and method have been evaluated using a segmental signal-to-noise ratio (SNR) measure for a large collection of data collected in both anechoic and echoic environments.

[0044] Referring to FIG. 1a, according to an embodiment of the present invention, a computer system 101 for implementing the present invention can comprise, inter alia, a central processing unit (CPU) 102, a memory 103 and an input/output (I/O) interface 104. The computer system 101 can be coupled through the I/O interface 104 to a display 105 and various input devices 106 such as a mouse and keyboard. The support circuits can include circuits such as cache, power supplies, clock circuits, and a communications bus. The memory 103 can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof. As such, the computer system 101 is a general purpose computer system that becomes a specific purpose computer system when executing a program of instructions or executable code 107 of the present invention. As shown in FIG. 1b, a calibration module 108 and delay estimation module 109 can be provided in conjunction with a computer system as hardware or software.

[0045] A general convolutive model for the mixing of two source signals at two sensors can be written as:

$x_1(t) = h_1 * s_1(t) + h_2 * s_2(t)$

$x_2(t) = s_1(t) + s_2(t)$  (1)

[0046] where $h_i$ represents unknown relative transfer functions of the first sensor versus the second sensor, t is time, wherein t is an index in the current frame of data, and $*$ represents convolution. s₁ and s₂ are the source signals.

[0047] With a low complexity source separation method, the treatment of the mixing problem can be simplified by considering only direct path signal components, rather than using a general convolutive propagation model. The component from one source arrives at the sensors with a fractional delay between the time of arrival at two closely spaced sensors. The fractional delay is a delay between sensors that is not generally an integer multiple of the sampling period and depends on the position of the source with respect to the array axis and the distance between sensors. The DAC mixing model in the time domain can be written as follows:

$x_1(t) = s_1(t - \delta_1) + c_1 \cdot s_2(t - \delta_2)$

$x_2(t) = c_2 \cdot s_1(t + \delta_1) + s_2(t + \delta_2)$  (2)

[0048] where c₁ and c₂ are two positive real numbers, accounting for non-calibrated microphones and for deviations from the far-field assumption; s₁ and s₂ are the two sources; and x₁ and x₂ are the mixtures at the respective microphones. Equation 2 describes a mixing matrix for the mixing model in the time domain, in terms of four parameters, δ₁, δ₂, c₁, and c₂.
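The mixing model of Equation 2 can be illustrated with a minimal sketch. The helper names `frac_delay` and `dac_mix` are hypothetical, and linear interpolation stands in for a true fractional-delay filter; that substitution is an assumption made to keep the example short, and a band-limited interpolator would be more faithful to the model.

```python
import math


def frac_delay(signal, delay):
    """Return signal(t - delay) for a possibly fractional delay in samples.

    Assumption: linear interpolation between neighbouring samples;
    samples outside the record are treated as zero.
    """
    n = len(signal)
    out = []
    for t in range(n):
        pos = t - delay
        i = math.floor(pos)
        frac = pos - i
        a = signal[i] if 0 <= i < n else 0.0
        b = signal[i + 1] if 0 <= i + 1 < n else 0.0
        out.append((1.0 - frac) * a + frac * b)
    return out


def dac_mix(s1, s2, delta1, delta2, c1=1.0, c2=1.0):
    """Equation 2: x1(t) = s1(t-d1) + c1*s2(t-d2), x2(t) = c2*s1(t+d1) + s2(t+d2)."""
    x1 = [a + c1 * b
          for a, b in zip(frac_delay(s1, delta1), frac_delay(s2, delta2))]
    x2 = [c2 * a + b
          for a, b in zip(frac_delay(s1, -delta1), frac_delay(s2, -delta2))]
    return x1, x2
```

Calling `dac_mix(s1, s2, 0.3, -0.4)` on two equal-length lists of samples would produce a pair of synthetic mixtures with sub-sample delays, which can be handy for exercising a delay estimator before recorded data is available.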

[0049] According to an embodiment of the present invention this mixing matrix is inverted. This can be performed in the frequency domain, and results in the following time domain solution:

$y_1(t) = h(t, \delta_1, \delta_2, c_1, c_2) * \left( x_1(t + \delta_2) - c_1 x_2(t - \delta_2) \right)$

$y_2(t) = h(t, \delta_1, \delta_2, c_1, c_2) * \left( -c_2 x_1(t + \delta_1) + x_2(t - \delta_1) \right)$  (3)

[0050] where the convolutive filter h accounts for the division by the determinant of the mixing matrix. In practice the criteria above can be simplified to a decorrelation between fractionally delayed sensor recordings:

$y_1(t) = x_1(t + d_1) - c_1 x_2(t)$

$y_2(t) = -c_2 x_1(t + d_2) + x_2(t)$  (4)

[0051] This is possible due to the freedom to shift signals under the assumption of decorrelation at any lag.
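The step from the mixing model to the demixing expressions can be traced as follows. This is a reconstruction from the stated model rather than a derivation given in the text, and the final identification of d₁ and d₂ is one reading of it. In the frequency domain a delay δ becomes a factor $e^{-j\omega\delta}$, so Equation 2 reads:

$\begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix} = \begin{bmatrix} e^{-j\omega\delta_1} & c_1 e^{-j\omega\delta_2} \\ c_2 e^{j\omega\delta_1} & e^{j\omega\delta_2} \end{bmatrix} \begin{bmatrix} S_1(\omega) \\ S_2(\omega) \end{bmatrix}$

The determinant is $D(\omega) = e^{j\omega(\delta_2 - \delta_1)} - c_1 c_2\, e^{j\omega(\delta_1 - \delta_2)}$, and inverting the 2×2 matrix gives:

$\begin{bmatrix} S_1(\omega) \\ S_2(\omega) \end{bmatrix} = \frac{1}{D(\omega)} \begin{bmatrix} e^{j\omega\delta_2} & -c_1 e^{-j\omega\delta_2} \\ -c_2 e^{j\omega\delta_1} & e^{-j\omega\delta_1} \end{bmatrix} \begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix}$

Back in the time domain, $1/D(\omega)$ corresponds to the filter h, and the two rows give $x_1(t + \delta_2) - c_1 x_2(t - \delta_2)$ and $-c_2 x_1(t + \delta_1) + x_2(t - \delta_1)$, which are the bracketed terms of Equation 3. Dropping h and advancing the first row by δ₂ and the second by δ₁ yields $x_1(t + 2\delta_2) - c_1 x_2(t)$ and $-c_2 x_1(t + 2\delta_1) + x_2(t)$, that is, the delayed-difference form of Equation 4 with $d_1 = 2\delta_2$ and $d_2 = 2\delta_1$ under this reading.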

[0052] The DAC method performs source separation by compensating for the true fractional delays and attenuations in the time domain with values determined from an output decorrelation constraint:

$R_{y_1 y_2}(\tau) = E[y_1(t)\, y_2(t + \tau)] = 0, \quad \forall \tau$  (5)

[0053] as a function of two unknown delays d₁ and d₂ and unknown scalar (attenuation) coefficients c₁ and c₂. E[·] is the time average of the quantity between square brackets. Attenuation coefficients c₁ and c₂ have values close to one (e.g., c₁≈c₂≈1) under the far-field source assumption. This is equivalent to the following criterion:

$\left\{ \hat{d}_1, \hat{d}_2, \hat{c}_1, \hat{c}_2 \right\} = \arg\min \sum_{\tau} R_{y_1 y_2}(\tau)$  (6)

[0054] A generalization of the solution in the reverberant case (e.g., Equation 1) can be obtained by imposing a minimum variance criterion, for example, $\arg\min_{G_{i1}, G_{i2}} \mathrm{Var}(Y_i - S_i)$, over all linear filtering combinations of X₁ and X₂:

$Y_i = G_{i1} X_1 + G_{i2} X_2$  (7)

[0055] The implementation includes the estimation of complex filters H₁ and H₂ defining the mixing model in Equation 1:

$Y(\omega) = \frac{1}{H_1 - H_2} \cdot \begin{bmatrix} 1 & -H_2 \\ -1 & H_1 \end{bmatrix} \cdot X$  (8)
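As an illustration of the reverberant-case inversion, the following sketch demixes a single frequency bin by inverting the 2×2 mixing matrix of Equation 1, X₁ = H₁S₁ + H₂S₂ and X₂ = S₁ + S₂. The function name `demix_bin` is hypothetical, and the filters H₁ and H₂ are assumed to be already estimated; how they are estimated is not shown here.

```python
def demix_bin(h1, h2, x1, x2):
    """Invert X1 = H1*S1 + H2*S2, X2 = S1 + S2 at one frequency bin.

    All arguments are complex numbers for a single frequency.
    """
    det = h1 - h2
    if abs(det) < 1e-12:
        # The two relative transfer functions coincide at this frequency,
        # so the sources cannot be told apart in this bin.
        raise ZeroDivisionError("H1 and H2 coincide; bin is not invertible")
    y1 = (x1 - h2 * x2) / det
    y2 = (-x1 + h1 * x2) / det
    return y1, y2
```

Applied bin by bin to the short-time spectra of the two mixtures, this would give source estimates in the frequency domain; the division by H₁ − H₂ is where such a scheme becomes sensitive to estimation error.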

[0056] Complexity and performance characteristics of the simple method, particularly on real environment data, can influence decisions for more complex approaches to deal with reverberant conditions.

[0057] According to an embodiment of the present invention, the method can simplify the delay estimation by dealing with attenuations in a calibration phase and evaluating output decorrelation based on the covariance of the mixtures. Calibration can be performed online. The calibration accounts for dissimilarities between microphones, e.g., microphones that are neither identical nor calibrated off-line.

[0058] Ideally, c₁=c₂=1 under the far-field assumption, and microphones have identical gain characteristics. In practice however, it can be difficult to impose the latter condition. Referring to FIG. 9, an online calibration criterion is provided for making gain levels commensurate on two channels, assuming a two microphone array. The variances of channels are equalized on a current data frame 901. The means and variances of the mixtures are recursively expressed 902, and the second channel is normalized to a variance level substantially similar to the first channel 903. On the current block of m data samples $x_j(t)$, $1 \le t \le m$, $1 \le j \le 2$, and index k, the current block mean $\bar{x}_j$ can be determined, for example, according to:

$\bar{x}_j = \frac{1}{m} \sum_{t=1}^{m} x_j(t)$

[0059] The running mean $\bar{x}_j^{(k-1)}$ can be updated by, for example:

$\bar{x}_j^{(k)} = (1-\beta)\, \bar{x}_j^{(k-1)} + \beta\, \bar{x}_j$

[0060] where β is a learning rate, for example, β=0.1. The current block variance $\mathrm{Var}_j$ can be determined according to, for example:

$\mathrm{Var}_j = \frac{1}{m} \sum_{t=1}^{m} \left| x_j(t) - \bar{x}_j^{(k)} \right|^2$

[0061] The running variance $v_j^{(k-1)}$ can be updated by, for example:

$v_j^{(k)} = (1-\beta)\, v_j^{(k-1)} + \beta\, \mathrm{Var}_j$

[0062] The second channel can be normalized so that its average energy is similar to that of the first channel:

$\hat{x}_2 = \sqrt{\frac{v_1^{(k)}}{v_2^{(k)}}}\; x_2$

[0063] The recursive formulas above have a direct online implementation. Furthermore, the attenuation parameters in Equation 4 can be dropped, simplifying the estimation of delays.
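The recursive formulas can be sketched as a small block-wise calibrator. The class name `OnlineCalibrator` is hypothetical, and the initial values of the running mean and variance (0 and 1) are assumptions, since the text does not say how the recursion is started; β=0.1 is the example value given above.

```python
import math


class OnlineCalibrator:
    """Block-wise gain calibration of channel 2 against channel 1."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.mean = [0.0, 0.0]  # running means (assumed starting value)
        self.var = [1.0, 1.0]   # running variances (assumed starting value)

    def process(self, x1, x2):
        """Update the running statistics with one block and rescale x2."""
        for j, block in enumerate((x1, x2)):
            m = len(block)
            block_mean = sum(block) / m
            self.mean[j] = (1 - self.beta) * self.mean[j] + self.beta * block_mean
            # Block variance measured around the updated running mean.
            block_var = sum((v - self.mean[j]) ** 2 for v in block) / m
            self.var[j] = (1 - self.beta) * self.var[j] + self.beta * block_var
        gain = math.sqrt(self.var[0] / self.var[1]) if self.var[1] > 0 else 1.0
        return list(x1), [gain * v for v in x2]
```

Each call consumes one block per channel and returns the first channel unchanged together with the rescaled second channel, so the state carried between blocks is just two means and two variances.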

[0064] The cross-covariance between y₁ and y₂, the outputs, can be expanded as follows:

$R_{y_1 y_2}(\tau) = R_{x_1 x_1}(d_1 - d_2 + \tau) - R_{x_1 x_2}(d_2 - \tau) - R_{x_1 x_2}(d_1 + \tau) + R_{x_2 x_2}(\tau)$  (10)

[0065] where $R_{x_i x_j}$ is the cross-correlation between $x_i$ and $x_j$, $1 \le i, j \le 2$.

[0066] Delay parameters can be estimated by minimizing this expression 904. Note that to determine sub-unit-delayed versions of cross-correlations, the delay parameters can be determined for a number of lags L.

TABLE 1
Real-time performance on a Pentium III 600 MHz for various values of L (number of lags) and window size m.

          m = 512      m = 1024     m = 4096
L = 8     990 ms/s     500 ms/s     200 ms/s
L = 10    1050 ms/s    600 ms/s     205 ms/s
L = 20    1500 ms/s    750 ms/s     260 ms/s
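A cost function built from the four-term expansion might look like the sketch below. Several choices here are assumptions, not statements of the method: the function names are hypothetical; the magnitude of the cross-covariance is summed over lags −L..L; sub-sample lags are obtained by linear interpolation between integer lags; and the lag convention is R_ab(τ) = mean over t of a(t+τ)·b(t), chosen so that the four terms line up with the expansion. Attenuations are taken as one, as after calibration.

```python
import math


def xcorr(a, b, lag):
    """R_ab(lag) = mean over t of a(t + lag) * b(t), for an integer lag."""
    n = len(a)
    total = 0.0
    for t in range(n):
        u = t + lag
        if 0 <= u < n:
            total += a[u] * b[t]
    return total / n


def xcorr_frac(a, b, lag):
    """Correlation at a fractional lag by linear interpolation (assumption)."""
    lo = math.floor(lag)
    w = lag - lo
    if w == 0:
        return xcorr(a, b, lo)
    return (1 - w) * xcorr(a, b, lo) + w * xcorr(a, b, lo + 1)


def dac_cost(x1, x2, d1, d2, num_lags=8):
    """Sum over tau of |R_y1y2(tau)| using the four-term expansion."""
    cost = 0.0
    for tau in range(-num_lags, num_lags + 1):
        r = (xcorr_frac(x1, x1, d1 - d2 + tau)
             - xcorr_frac(x1, x2, d2 - tau)
             - xcorr_frac(x1, x2, d1 + tau)
             + xcorr_frac(x2, x2, tau))
        cost += abs(r)
    return cost
```

This version recomputes every correlation on each call, which is far too slow for real-time use; a practical variant would compute the correlations of the mixtures once per block over the needed lag range and only interpolate inside the cost function.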

[0067] A real-time application can be implemented as a multi-threaded Windows task on a Pentium III PC. The inputs can come from the auxiliary input of the standard PC sound card, while outputs are continuously streamed to, for example, headphones. At least one thread performs the I/O of audio data in real time. At least one other thread is responsible for the analysis, calibration, delay estimation and synthesis of the demixed signals.
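The two-thread arrangement can be sketched with a bounded queue between an I/O thread and a processing thread. This is only an illustration of the structure: the function `run_pipeline` is hypothetical, a plain list of blocks stands in for the sound card, and the `process_block` callback stands in for calibration, delay estimation and synthesis.

```python
import queue
import threading


def run_pipeline(blocks, process_block):
    """Pass blocks from an I/O thread to a processing thread, in order."""
    inbox = queue.Queue(maxsize=8)  # bounded, so I/O cannot run far ahead
    outputs = []

    def io_thread():
        for block in blocks:        # stands in for reading the sound card
            inbox.put(block)
        inbox.put(None)             # sentinel: no more audio

    def worker_thread():
        while True:
            block = inbox.get()
            if block is None:
                break
            outputs.append(process_block(block))

    threads = [threading.Thread(target=io_thread),
               threading.Thread(target=worker_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```

With a single worker and a FIFO queue the outputs keep the order of the inputs; in a live system the worker would hand each demixed block to a playback queue instead of collecting it in a list.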

[0068] Calibrated data are fed into the delay parameter estimation module, which can use, for example, the Amoeba optimization method as taught by W. H. Press et al., Numerical Recipes in C, Cambridge University Press, 1988, to find a local solution. Delay values are constrained based on d; thus, the solution is global. Optimization uses the cost function (Equation 10), wherein an initial simplex can be selected including, for example, three pairs of delays. The initial simplex is centered at the delays of the last data block: (d₁+0.05, d₂+0.05), (d₁−0.05, d₂−0.05), and (d₁+0.05, d₂−0.05) (in samples). Solutions (d₁*, d₂*) of the optimization can be smoothed using a learning rate α; the equation can be written as follows:

$d_j = d_j^{(k)} = (1 - \alpha) \cdot d_j^{(k-1)} + \alpha \cdot d_j^{*}, \quad j = 1, 2$  (11)

[0069] Delays can be sorted to ensure stability with respect to the permutation problem. The correspondence between delays and sources is unique when sources are not symmetrical with respect to the receiver axis. Thus, the sorted delays can be used to directly generate separated outputs. FIG. 5 presents performance measurements with this implementation.
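The search, smoothing and sorting steps can be sketched as below. The Amoeba routine of Numerical Recipes is a downhill simplex (Nelder-Mead) search, and the compact version here is a generic textbook variant, not the routine from that book. The function names, the stopping tolerance, the default α=0.1, and the choice to sort the optimizer's solution before smoothing are all assumptions made for illustration.

```python
def initial_simplex(d1, d2, step=0.05):
    """Three delay pairs around the previous block's estimate (in samples)."""
    return [[d1 + step, d2 + step],
            [d1 - step, d2 - step],
            [d1 + step, d2 - step]]


def nelder_mead(cost, simplex, max_iter=500, tol=1e-8):
    """Minimal downhill simplex search; returns the best point found."""
    pts = [list(p) for p in simplex]
    n = len(pts[0])
    for _ in range(max_iter):
        pts.sort(key=cost)
        best, worst = pts[0], pts[-1]
        if max(abs(p[i] - best[i]) for p in pts for i in range(n)) < tol:
            break  # simplex has collapsed to a point
        centroid = [sum(p[i] for p in pts[:-1]) / n for i in range(n)]
        reflected = [2 * centroid[i] - worst[i] for i in range(n)]
        if cost(reflected) < cost(best):
            expanded = [3 * centroid[i] - 2 * worst[i] for i in range(n)]
            pts[-1] = expanded if cost(expanded) < cost(reflected) else reflected
        elif cost(reflected) < cost(pts[-2]):
            pts[-1] = reflected
        else:
            contracted = [0.5 * (centroid[i] + worst[i]) for i in range(n)]
            if cost(contracted) < cost(worst):
                pts[-1] = contracted
            else:  # shrink every point towards the best one
                pts = [best] + [[0.5 * (best[i] + p[i]) for i in range(n)]
                                for p in pts[1:]]
    return min(pts, key=cost)


def update_delays(previous, found, alpha=0.1):
    """Sort the new solution, then smooth it as in Equation 11."""
    ordered = sorted(found)
    return [(1 - alpha) * p + alpha * f for p, f in zip(previous, ordered)]
```

Per block, one would call `nelder_mead` with a cost function of the pair (d₁, d₂) and `initial_simplex` built from the previous estimate, then pass the result through `update_delays`. Sorting before smoothing keeps each smoothed track attached to the same source if the optimizer returns the pair in swapped order.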

[0070] According to an embodiment of the present invention, an important characteristic of the DAC approach is the artifact-free nature of the outputs.

[0071] A method implementing the present invention was evaluated on real data recorded in an anechoic room and in a strongly echoic environment. As shown in FIG. 2, the measured impulse response for the echoic environment revealed a reverberation time of about 500 msec.

[0072] The real-time method was successful in separating voices from anechoic mixtures, even when sources had similar spectral power characteristics. The method generally separates at least one voice in echoic voice mixtures, while achieving about three to four dB segmental SNR improvement on average. A frame size of 512 samples was chosen.
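For readers unfamiliar with the measure, one common definition of segmental SNR averages a per-frame SNR in dB over frames. The sketch below follows that definition; the text does not state which variant was used in the evaluation, so the frame handling, the skipping of silent frames, and the function name `segmental_snr` are assumptions.

```python
import math


def segmental_snr(reference, estimate, frame=512, eps=1e-12):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    snrs = []
    for start in range(0, len(reference) - frame + 1, frame):
        ref = reference[start:start + frame]
        est = estimate[start:start + frame]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - e) ** 2 for r, e in zip(ref, est))
        if signal_energy > eps:  # skip frames with no reference signal
            snrs.append(10.0 * math.log10(signal_energy / (error_energy + eps)))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

An improvement figure would then be the difference between the value computed on a separated output and the value computed on the unprocessed mixture, both against the same clean reference.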

[0073] For anechoic data sets, FIG. 3 shows segmental SNR separation results as a function of the difference in angles of arrival of the wave fronts. FIG. 4 shows segmental SNR separation results as a function of the higher angle of one of the two sources.

[0074] Results for echoic data sets are shown in FIGS. 5 and 6. FIG. 5 shows segmental SNR separation results as a function of the difference in angles of arrival for the echoic data set. FIG. 6 shows segmental SNR separation results as a function of the higher angle of one of the two sources for the echoic data set.

[0075] The delay estimation method converges close to the true delay values, provided voice is present, after processing only about 150-200 milliseconds of anechoic data, or about 2500 samples at a 16 kHz sampling frequency. FIGS. 7 and 8 exemplify the convergence and variation in the delay estimates and the instantaneous SNR as an online method progresses as a function of the number of data frames processed.
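As a quick check of the figures quoted for convergence, the sample count and the duration agree, and with the 512-sample frame size used in the evaluation they correspond to roughly five frames:

$\frac{2500\ \text{samples}}{16000\ \text{samples/s}} \approx 0.156\ \text{s}, \qquad \frac{2500}{512} \approx 4.9\ \text{frames}$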

[0076] The present invention has been tested on more than one thousand combinations of voices recorded in real anechoic and echoic environments. The performance of the system is good on anechoic data. Although the method is designed for anechoic environments, its complexity and performance on real data represent a basis for designing more complex approaches to deal with reverberant environments.

[0077] Having described embodiments for a method of audio source separation by delay and attenuation compensation, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

What is claimed is:
1. A method for separating at least two audio channels recorded using an array of at least two microphones comprising the steps of: equalizing variances of a first channel and a second channel on a current data frame; recursively expressing means and variances of mixtures; and normalizing the second channel to a variance level substantially similar to a variance of the first channel.
2. The method of claim 1, wherein on a current block of m data samples $x_j(t)$, $1 \le t \le m$, $1 \le j \le 2$, and index k, a current block mean $\bar{x}_j$ can be determined according to:
$\bar{x}_j = \frac{1}{m} \sum_{t=1}^{m} x_j(t)$


3. The method of claim 1, wherein a running mean $\bar{x}_j^{(k-1)}$ can be updated by:
$\bar{x}_j^{(k)} = (1-\beta)\, \bar{x}_j^{(k-1)} + \beta\, \bar{x}_j$
where β is a learning rate.
4. The method of claim 1, wherein a current block variance $\mathrm{Var}_j$ is determined according to:
$\mathrm{Var}_j = \frac{1}{m} \sum_{t=1}^{m} \left| x_j(t) - \bar{x}_j^{(k)} \right|^2$


5. The method of claim 1, wherein a running variance $v_j^{(k-1)}$ is updated by:
$v_j^{(k)} = (1-\beta)\, v_j^{(k-1)} + \beta\, \mathrm{Var}_j$
6. The method of claim 1, wherein the step of normalizing the second channel further comprises normalizing an average energy to be similar to an average energy of the first channel according to:
$\hat{x}_2 = \sqrt{\frac{v_1^{(k)}}{v_2^{(k)}}}\; x_2$


7. The method of claim 1, further comprising the step of determining delay parameters by minimizing a cross-covariance between two outputs.
8. The method of claim 7, wherein the cross-covariance between the outputs is expanded as:
$R_{y_1 y_2}(\tau) = R_{x_1 x_1}(d_1 - d_2 + \tau) - R_{x_1 x_2}(d_2 - \tau) - R_{x_1 x_2}(d_1 + \tau) + R_{x_2 x_2}(\tau)$
where $R_{x_i x_j}$ is the cross-correlation between $x_i$ and $x_j$, $1 \le i, j \le 2$.
9. The method of claim 7, further comprising the step of determining sub-unit-delayed versions of cross-correlations, wherein the delay parameters are determined for a number of lags L.
10. A system for separating two audio channels recorded by an array of microphones comprising: a calibration module for normalizing gain levels between a plurality of channels on each of a plurality of data frames, wherein each data frame is expressed in terms of time; and a delay parameter estimation module for accepting an output comprising the normalized channels, and estimating a delay parameter for a plurality of data frame sizes over a plurality of lag times, and sorting delays to generate corresponding source separated outputs.
11. The system of claim 10, wherein the source separated outputs of the delay parameter estimation module are output in real-time.
12. The system of claim 10, wherein the calibration module compensates for attenuations at the microphones.
13. The system of claim 10, wherein the delay parameter determines relative delays of arrival of wave fronts at each microphone.
14. A method for separating at least two audio channels recorded using an array of at least two microphones comprising the steps of: constraining a mixing model of the at least two audio channels in a time domain to direct path signal components; defining a plurality of delays with respect to a midpoint between microphones, wherein delays depend on the distance between sensors and the speed of sound; inverting a mixing matrix, corresponding to the mixing model, in the frequency domain; and compensating for a plurality of true fractional delays and attenuations in the time domain, wherein values of the delays and attenuations are determined from an output decorrelation constraint.
15. The method of claim 14, further comprising the step of estimating a complex filter for each microphone, wherein the complex filters define the mixing model.
16. The method of claim 14, wherein the mixing matrix corresponding to the mixing model comprises two delay parameters and two parameters corresponding to the speed of sound.
17. The method of claim 14, wherein the output decorrelation constraint is a function of two unknown delays and unknown scalar coefficients.
18. The method of claim 17, wherein the unknown scalar coefficients are attenuation coefficients substantially equal to one.
19. The method of claim 14, further comprising the step of imposing a minimum variance criterion for a reverberant case over all linear filtering combinations of X₁ and X₂.
20. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for separating at least two audio channels recorded using an array of at least two microphones, the method steps comprising: equalizing variances of a first channel and a second channel on a current data frame; recursively expressing means and variances of mixtures; and normalizing the second channel to a variance level substantially similar to a variance of the first channel.